Female Employment

This is my blog site for my DH140 Final Project
Author

Pavana Atawale

Published

August 4, 2023

Modified

August 4, 2023

Introduction

I decided to focus on the topic of discrimination on the basis of sex, specifically in the professional sector. As a woman who is involved in the Computer Science field, I am extremely aware of the differences between how my male peers (in academia or industry) are treated and how my female peers and I are treated. I have heard many horror stories from other women in my life about their professional experiences being hampered by their gender. I wondered if this was a general trend, or only in the CS field. I decided to use this project as a way to explore not only the presence of this discrimination, but also the possible causes of these discrepancies in the way women are treated at work.

I am aiming to answer the following question: What factors play a role in the differences in female and male employment?

This project is built upon the project I worked on for a previous class I took at UCLA (DH 101). For that project, we used the same dataset, but focused on a broad range of questions relating to this topic. For this project, I chose to focus more specifically on the question that most interested me.

In my previous class, we were inspired by the dataset we were assigned, specifically the entrepreneurship section. However, as we progressed, I found myself becoming more fascinated with the broader issue of gender-based discrimination in employment. In countries throughout the world, employment is affected by a range of factors. Throughout this project, I attempted to uncover some of the factors that influence female employment.

Methods

For my project, I used data from The World Bank’s Gender Data Portal, which presents gender statistics, primarily from the United Nations and UNESCO. Its aim is to enhance comprehension of gender data and enable analyses that can inform policy decisions.

The dataset relies upon census data from member countries, surveys from other organizations, and existing research done by the United Nations. To streamline and standardize their data collection methods, the World Bank works with international organizations like the United Nations to adhere to strict standards of measurement.

The dataset contains various thematic indicators, each having entries for a variety of countries across the world. Each country also has entries for different years, ranging from 1970 to 2022. The indicators are split into different categories, such as education, leadership, etc. For this project, I decided to focus on the topics of employment and health.

My process for this project was as follows:
* I imported, cleaned, and sorted through the datasets.
* I explored the different indicators, countries, and years.
  * I did so by printing all the different indicators and filtering through them.
* Then, I identified important indicators that could answer my question.
* I created visualizations to illustrate the connections between indicators, countries, and years.
* I analyzed the visualizations and used them to answer my question.
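The exploration step in the list above can be sketched roughly as follows. This is only an illustration, not the exact code I ran: `toy_df` is a tiny made-up frame with the same column layout as my datasets, and `find_indicators` is a hypothetical helper name.

```python
import pandas as pd

# Tiny made-up frame with the same columns as the real datasets (values are illustrative only)
toy_df = pd.DataFrame({
    "Indicator Name": [
        "Labor force, female",
        "Labor force, female",
        "Length of paid maternity leave (calendar days)",
        "Female share of employment in senior and middle management (%)",
    ],
    "Country Name": ["A", "B", "A", "B"],
    "Year": [2020, 2020, 2022, 2021],
    "Value": [1.0, 2.0, 90.0, 30.0],
})

def find_indicators(df, keyword):
    """Return the sorted unique indicator names that contain `keyword` (case-insensitive)."""
    names = pd.Series(df["Indicator Name"].unique())
    mask = names.str.contains(keyword, case=False, regex=False)
    return sorted(names[mask])

labor_indicators = find_indicators(toy_df, "labor force")
maternity_indicators = find_indicators(toy_df, "maternity")
print(labor_indicators)
print(maternity_indicators)
```

Searching by keyword like this is easier than reading through the full printed list when a dataset has many indicators.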

I also found a GeoJSON file from the World Bank. I did have to edit that file separately, in order to make the names on the JSON file match the names in my datasets. This was done manually on my computer, before I uploaded it to be used for the mapping here.

Results

Data Exploration

#Import statements
import pandas as pd
import matplotlib.pyplot as plt
import folium
import plotly
import plotly.io as pio
pio.renderers.default = "plotly_mimetype+notebook_connected"
import plotly.express as px
from plotly.subplots import make_subplots
import numpy as np
#Import datasets from github repository

employ_df = pd.read_csv('https://raw.githubusercontent.com/patawale/DH140FinalProject/main/Employment.csv')
employ_df.columns = ['Indicator Name', 'Indicator Code', 'Country Name', 'Country Code','Year', 'Value']
employ_df = employ_df[['Indicator Name','Country Name', 'Year', 'Value']].copy()

health_df = pd.read_csv('https://raw.githubusercontent.com/patawale/DH140FinalProject/main/Health.csv')
health_df = health_df[['Indicator Name','Country Name', 'Year', 'Value']].copy()

employ_ind = employ_df['Indicator Name'].unique()
health_ind = health_df['Indicator Name'].unique()

geo = "https://raw.githubusercontent.com/patawale/DH140FinalProject/main/WorldBank.geojson"

Data Analysis

Female and Male Employment in Agriculture, Industry, and Services

“How does female and male employment differ based on business sector?”

Whether or not a woman is employed is not the only factor I am considering when looking at the differences in female and male employment. The sector in which a woman is employed is also important. Someone with no prior knowledge of this topic might answer the question above by saying that there is no difference, and that women and men are employed in all sectors. However, based on my previous experience, that might not be entirely true. In the following graph, I explore female and male employment in three different business sectors (agriculture, industry, and services), grouped by income level.

#List of 'Country Name' values that involve groups of countries in different income classes
incomes = ['Low income', 'Lower middle income', 'MIC', 'Upper middle income', 'High income']

#Filter dataset to only include relevant data
df = employ_df[employ_df['Indicator Name'].str.contains('Employment in ')]
df = df[~df['Indicator Name'].str.contains('total employment')]
df = df[df['Country Name'].isin(incomes)]
df = df.replace("MIC", "Middle income")

#Create and display a bar graph using plotly
fig = px.bar(df, x="Country Name", y="Value",
           color='Indicator Name', hover_name='Indicator Name', barmode='group',
           animation_frame="Year",
          labels={
                     "Country Name": "Income level",
                     "Indicator Name": "Industry and Gender Descriptor",
                     "Value": "% of Employment"
                 },)
fig.update_layout(legend=dict(
    orientation="h",
    yanchor="bottom",
    y=1.02,
    xanchor="right",
    x=1
))
fig.show()

This visualization contrasts the percentage of men and women employed in three different sectors of business: agriculture, industry, and services. The data was collected from different countries around the world, grouped by income status. From this graph, it can be seen that in every income group, the percentage of men in each business sector surpasses the percentage of women in the same sector. Thus, in total, men are present in business more than women. More specifically, we can see patterns forming, based on the women in different income groups. In higher-income groups, women are most likely to be working in the service industry. However, in lower-income groups, women are most likely to be working in the agriculture industry. Thus, we can see that income influences the type of employment that women achieve.

Overall, we can clearly see that women, regardless of income class, are more likely than men in the same income class to be working in the service industry. On the other hand, regardless of income, men are more likely to be working in industry. Thus, regardless of income, there are still differences in the business sectors in which men and women are employed.

Note: to clarify, this data was collected by grouping countries into income levels (low income to high income), then reporting the compiled data from each of these countries. It does not account for income disparities within a country.

Effects of Maternity Leave on Female Labor Force

The first major issue I can think of that might have an impact on the differences in female and male employment is maternity leave. Whether or not a woman is supported during her maternity leave would have a notable effect on her willingness and ability to rejoin the labor force. In the following graph, I explore the effects of the length of maternity leave on female participation in the labor force.

# Filter dataset to only include relevant data
df1 = employ_df[employ_df['Indicator Name'] == 'Length of paid maternity leave (calendar days)']
df1 = df1[df1['Year'] == 2022]
df1 = df1[['Country Name', 'Value']].copy()
#print(df1.sort_values(by=['Value']))
# Filter dataset to only include relevant data
# Note: this filtering is more involved than usual because I need the latest
# data for each country. The loop keeps the first matching row it sees per
# country, which assumes each country's rows are ordered newest-first.

indices = []

prev_country = ""
for index, row in employ_df.iterrows():
  indic = row['Indicator Name']
  country = row['Country Name']
  if(indic == 'Labor force participation rate, female (% of female population ages 15+) (national estimate)'):
    if (country != prev_country):
      indices.append(index)
      prev_country = country
    
# iterrows() yields index labels, so select by label with .loc
df2 = employ_df.loc[indices]
df2 = df2[['Country Name', 'Value']].copy()

#print(df)
#print(df2.sort_values(by=['Value']).to_string())
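The loop above keeps the first matching row per country, so it only returns the latest value if the file happens to list each country's rows newest-first. A minimal sketch of an alternative that does not rely on row order is shown here. It uses made-up rows, and `latest_per_country` is a hypothetical helper, not part of my original analysis.

```python
import pandas as pd

# Made-up rows in the same long format as employ_df (values are illustrative only)
toy_df = pd.DataFrame({
    "Indicator Name": ["LFP"] * 5,
    "Country Name": ["A", "A", "B", "B", "B"],
    "Year": [2019, 2021, 2020, 2018, 2022],
    "Value": [50.0, 52.0, 40.0, 38.0, 41.0],
})

def latest_per_country(df, indicator):
    """Keep, for each country, the row of `indicator` with the most recent year."""
    subset = df[df["Indicator Name"] == indicator]
    # Sort by year so that the last row per country is the most recent one
    latest = subset.sort_values("Year").drop_duplicates("Country Name", keep="last")
    return (latest[["Country Name", "Year", "Value"]]
            .sort_values("Country Name")
            .reset_index(drop=True))

latest = latest_per_country(toy_df, "LFP")
print(latest)
```

Keeping the `Year` column in the result also makes it easy to see how old each country's latest figure is.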
#Create and display both maps
def addTooltip(m):
  style_function = lambda x: {'fillColor': '#ffffff', 
                            'color':'#000000', 
                            'fillOpacity': 0.1, 
                            'weight': 0.1}
  highlight_function = lambda x: {'fillColor': '#000000', 
                                  'color':'#000000', 
                                  'fillOpacity': 0.50, 
                                  'weight': 0.1}
  NIL = folium.features.GeoJson(
      geo,
      style_function=style_function, 
      control=False,
      highlight_function=highlight_function, 
      tooltip=folium.features.GeoJsonTooltip(
          fields=['NAME_EN','INCOME_GRP'],  # use fields from the json file
          aliases=['Country: ','Income: '],
          style=("background-color: white; color: #333333; font-family: arial; font-size: 12px; padding: 10px;") 
      )
  )
  m.add_child(NIL)
  m.keep_in_front(NIL)
  folium.LayerControl().add_to(m)

m = folium.Map(location=[40,0], zoom_start=2)
folium.Choropleth(
    geo_data=geo,
    data=df1,
    columns=["Country Name", "Value"],
    key_on="feature.properties.NAME_EN",
    legend_name="Length of paid maternity leave (calendar days)",
    fill_color="Reds",
).add_to(m)
addTooltip(m)
display(m)

m = folium.Map(location=[40,0], zoom_start=2)
folium.Choropleth(
    geo_data=geo,
    data=df2,
    columns=["Country Name", "Value"],
    legend_name="Female Labor Force Participation Rate (%)",
    key_on="feature.properties.NAME_EN",
    fill_color="Blues",
).add_to(m)
addTooltip(m)
display(m)

Length of Maternity Leave

The first map shows the length of paid maternity leave (in days) in different countries. The data is represented on a scale from light pink to dark red: light pink represents a short maternity leave, while dark red represents a long one. Just from looking at the map, one can tell that most countries have quite short paid maternity leaves. In fact, there are 11 countries that have 0 days of paid maternity leave, including the US. San Marino (a small country surrounded by Italy) has the longest paid maternity leave, at 635 days.

Female Labor Force Participation

The second map shows female labor force participation in different countries. The percentages are represented on a scale from light blue to dark blue. A low percentage (light blue) means that, of the female population ages 15+, only a small share participates in the labor force. Similarly, a high percentage (dark blue) means that a large share of women ages 15+ participates in the labor force. There is no discernible pattern to the percentage of female labor force participation. However, what we can see is that the participation rate does not reach 100% for any country, even the ones classified as ‘high income’, such as the US. In fact, the highest participation rate is San Marino's, at 87.59%. The lowest is Yemen's, at 6.04%. The US is around the middle at 56.79%.

Connection

There is no immediately discernible connection between the length of paid maternity leave and female labor force participation. The one notable overlap is at the top: San Marino has both the highest labor force participation rate and the longest paid maternity leave. Beyond that, these maps do not show much of a correlation between the two, which suggests that other factors affect female participation in the labor force.
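Comparing two maps by eye is imprecise, so one way to check this impression would be to join the two per-country tables and compute a correlation coefficient. The following is a minimal sketch with made-up values standing in for `df1` and `df2`. The names `leave`, `rate`, and `both` are hypothetical, and I did not run this on the real data.

```python
import pandas as pd

# Made-up values, shaped like df1 (leave length in days) and df2 (participation rate in %)
leave = pd.DataFrame({"Country Name": ["A", "B", "C", "D"],
                      "Value": [0.0, 98.0, 126.0, 180.0]})
rate = pd.DataFrame({"Country Name": ["A", "B", "C", "E"],
                     "Value": [56.0, 48.0, 61.0, 70.0]})

# An inner merge keeps only the countries that appear in both tables
both = leave.merge(rate, on="Country Name", suffixes=("_leave", "_rate"))

# Series.corr computes the Pearson correlation by default
r = both["Value_leave"].corr(both["Value_rate"])
print(len(both), "countries in common; Pearson r =", round(r, 3))
```

A value of `r` near 0 would support the reading that the two indicators are not linearly related, while a value near 1 or -1 would contradict it.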

Effects of Contraceptive Prevalence on Female Labor Force in US

Similar to my intuition about the effect of maternity leave on female labor force participation, I was also curious about the effects of contraception on women in the labor force. Access to contraception allows women more freedom in their lives, not being beholden to the complications of unwanted pregnancies. I would assume that with the increase of contraception, more women would begin working. In the following graph, I explore the effects of contraception prevalence on female participation in the labor force.

#Filter and merge datasets to only include relevant data
contra = health_df[health_df['Indicator Name'] == 'Contraceptive prevalence, any modern method (% of married women ages 15-49)']
labor = employ_df[employ_df['Indicator Name'] == 'Labor force participation rate, female (% of female population ages 15+) (national estimate)']
men = employ_df[employ_df['Indicator Name'] == 'Labor force participation rate, male (% of male population ages 15+) (national estimate)']


df = pd.concat([contra, labor])
df = pd.concat([df, men])
df = df[(df['Country Name'].eq('United States'))]

#Create and display line graph using plotly
fig = px.line(df, x="Year", y="Value", color='Indicator Name', hover_name="Indicator Name", 
              labels={
                     "Indicator Name": "Factor",
                     "Value": "% of population"
                 },)
fig.update_layout(legend=dict(
    orientation="h",
    yanchor="bottom",
    y=1.02,
    xanchor="right",
    x=1
))
fig.show()

This graph compares the evolution over time of female labor force participation in the US to the contraceptive prevalence in the US. We can see some individual patterns for each factor. For contraceptive prevalence, there is a sharp increase in the late 1960s and early 1970s. However, there is a sharp decline in the mid-1970s, eventually stabilizing from 1980 until now. For female labor force participation, there is a slow increase from 1960 to around 2000, where it peaks, before beginning a slight decline until now.

Thus, at first glance, we can see that there was a correlation between the two factors, at least in the 1960s and 1970s. However, now there does not seem to be a connection, as even when contraceptive prevalence had a slight peak in 2016, the female participation rate essentially stayed the same.

Originally, I had only included the female labor force participation rate in this graph. However, as I was writing up my analysis, I found that I was curious about the comparison between the male and female labor force participation rate. When accounting for that, the connection between contraception prevalence and the female participation rate becomes stronger. This is because it is clear that the male labor force participation rate was not affected at all by the change in contraceptive prevalence.
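The comparison between the three series could also be made numerically by pivoting the long-format data so that each indicator becomes its own column. This is a hedged sketch with made-up numbers. The short indicator labels (`contra`, `female`, `male`) are placeholders for the full indicator names, and it assumes the frame has already been filtered to a single country.

```python
import pandas as pd

# Made-up long-format rows for one country (values are illustrative only)
toy = pd.DataFrame({
    "Indicator Name": ["contra"] * 4 + ["female"] * 4 + ["male"] * 4,
    "Year": [1970, 1980, 1990, 2000] * 3,
    "Value": [50.0, 60.0, 62.0, 63.0,   # contraceptive prevalence
              43.0, 51.0, 57.0, 59.0,   # female participation rate
              80.0, 77.0, 76.0, 75.0],  # male participation rate
})

# One row per year, one column per indicator
wide = toy.pivot(index="Year", columns="Indicator Name", values="Value")

# Pairwise Pearson correlations between the three series
corr = wide.corr()
print(corr.round(2))
```

With real data, contraceptive prevalence is likely reported for fewer years than the participation rates, so `wide` would contain missing values. `DataFrame.corr` skips missing pairs, but the result would then rest on very few data points.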

Note: the data collected for contraceptive prevalence was taken from the population of married women from ages 15-49. Thus, there might be a different conclusion to glean if we had access to this same data from the population of all women.

Female Employment in India and the US

Another facet of employment is the leadership opportunities available to women. I wanted to explore not only the relationship between female leaders and overall female employment, but also how that relationship differs between countries. To carry on the theme of policies surrounding maternal health and discrimination, I looked for countries that differ in their policies toward pregnant workers. Two such countries are India and the US. Although they are similar in terms of political systems, the US prohibits the dismissal of pregnant workers, while India does not. In the following graph, I explore the evolution of female labor force participation as well as female presence in management over time in both the US and India.

#Filter and merge datasets to only include relevant data
dismiss = employ_df[employ_df['Indicator Name'] == 'Dismissal of pregnant workers is prohibited (1=yes; 0=no)']
dismiss = dismiss[['Country Name', 'Year', 'Value']].copy()
dismiss.columns = ['Country Name', 'Year', 'Dismissal of pregnant workers is prohibited (1=yes; 0=no)']

female = employ_df[employ_df['Indicator Name'] == 'Labor force, female']
female = female[['Country Name', 'Year', 'Value']].copy()
female.columns = ['Country Name', 'Year', 'Labor force, female']

manage = employ_df[employ_df['Indicator Name'] == 'Female share of employment in senior and middle management (%)']
manage = manage[['Country Name', 'Year', 'Value']].copy()
manage.columns = ['Country Name', 'Year', 'Female share of employment in senior and middle management (%)']

df = female[['Country Name', 'Year']].copy()
df = pd.merge(df, female, on = ['Country Name',  'Year'], how = "inner")
df = pd.merge(df, dismiss, on = ['Country Name',  'Year'], how = "inner")
df = pd.merge(df, manage, on = ['Country Name',  'Year'], how = "inner")
#Create and display line graph using plotly
df1 = df[(df['Country Name'] == "United States")]

subfig = make_subplots(specs=[[{"secondary_y": True}]])

fig1 = px.line(df1, x="Year", y='Labor force, female', title="US")
fig1.update_traces(yaxis="y2")

fig2 = px.line(df1, x="Year", y='Female share of employment in senior and middle management (%)')

subfig.add_traces(fig1.data + fig2.data)
subfig.layout.xaxis.title="Year"
subfig.layout.yaxis.title="% of Women in Management"
subfig.layout.yaxis2.type="log"
subfig.layout.yaxis2.title="Female Labor Force (log value)"
subfig.for_each_trace(lambda t: t.update(line=dict(color=t.marker.color)))
subfig.update_layout(title_text="United States")
subfig.show()

#Create and display line graph using plotly
df2 = df[(df['Country Name'] == "India")]

subfig = make_subplots(specs=[[{"secondary_y": True}]])

fig1 = px.line(df2, x="Year", y='Labor force, female')
fig1.update_traces(yaxis="y2")

fig2 = px.line(df2, x="Year", y='Female share of employment in senior and middle management (%)')

subfig.add_traces(fig1.data + fig2.data)
subfig.layout.xaxis.title="Year"
subfig.layout.yaxis.title="% of Women in Management"
subfig.layout.yaxis2.type="log"
subfig.layout.yaxis2.title="Female Labor Force (log value)"
subfig.for_each_trace(lambda t: t.update(line=dict(color=t.marker.color)))
subfig.update_layout(title_text="India")
subfig.show()

These graphs show the evolution of female labor force and female share of management. The female share of management (red) is recorded as the percentage of women of the total employment in senior and middle management. The female labor force (blue) is recorded as the number of women participating in the labor force. As the population numbers for both these countries are quite large, I had to graph the female labor force values on a logarithmic scale.

At first glance, the two graphs look quite different. However, upon further inspection, there are some insights to be gleaned from them.

For the US, the evolution of female labor force participation and female share of management seem to be changing at a similar rate. Both seem to be increasing pretty steadily over time. However, for India, both seem to be changing at different rates. From 2010 to 2018, while the percentage of women in management is increasing, the female labor force is decreasing. There does not seem to be as clear a pattern as there was for the US.

However, if we look at the specific values, we can learn something intriguing. For female management, India at its highest (17.3% in 2020) is still much less than the US at its lowest (36.9% in 2006). Thus, the dismissal policies for pregnant women might have had a general impact on women in management. However, there is not enough information to ascribe that as the cause for the discrepancies between these countries.

For reference, the population of India in 2021 was 1.408 billion, while the population of the US in 2021 was 331.9 million. This means that of the entire population of India, at its largest only about 12% are women in the workforce, while in the US, 23% of the population are women in the workforce. Thus, this could mean that there are factors other than anti-discrimination policies that have an impact on female employment (for both the labor force and female management).
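As a quick sanity check on the arithmetic in the paragraph above, the quoted populations and percentages can be turned back into approximate head counts. These figures are back-calculated from the rounded percentages, not read from the dataset, so they are only rough.

```python
# Populations and shares exactly as quoted in the paragraph above
population = {"India": 1.408e9, "United States": 331.9e6}
share_of_population = {"India": 0.12, "United States": 0.23}

# Implied number of women in the labor force, in millions (rounded)
implied_millions = {
    country: round(population[country] * share_of_population[country] / 1e6)
    for country in population
}
print(implied_millions)  # {'India': 169, 'United States': 76}
```

So although India's share is about half of the US share, its much larger population means the implied number of women in the labor force is roughly twice as large. This is why the labor force values needed a logarithmic axis.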

Note: there were severe limiting factors for this visualization. With both the data given in the dataset, and the tools I had access to, I was not able to graph this visualization in as clear a manner as I would have wanted. In addition, I would have wanted to compare these values for a single country that changed its dismissal policies, but that data was not available in the dataset. With more information, perhaps a more significant conclusion could have been drawn.

Discussion

Overall, throughout this project, I aimed to answer my question: What factors play a role in the differences in female and male employment? In the process of doing so, I found myself going down two different paths of analysis.

I first looked at the difference in the sectors that employ men and women, based on income class. I discovered that income does in fact have an effect on the sector in which both men and women are employed. In addition, I also noticed that women are employed in the service industry more than men are. This is not insignificant, as it shows that while income does have an impact on the differences in female and male employment, there are still some discrepancies that persist regardless of income.

As I continued to explore the data, I found another path that piqued my interest. I started analyzing the impact of women’s health on female employment. The last three of my visualizations show that analysis. I compared the effects of maternity leave, contraceptive prevalence, and anti-discrimination policies protecting pregnant women on female employment. While I did not find definite correlations, I found enough to suggest that these factors have a not insignificant impact on female employment.

When exploring the effects of maternity leave on female employment, I made a particularly interesting discovery. I noticed that the country with the longest paid maternity leave also had the highest female labor force participation rate. Unfortunately, that pattern did not seem to persist throughout the rest of the countries, but it was interesting to note. In general, it seemed that there was no real correlation between the length of paid maternity leave and female labor force participation. However, this could just mean that there are other, more significant factors that impact female employment.

I also explored the effects of anti-discrimination policies on female employment. Specifically, I looked at the effects of the prohibition of dismissal of pregnant women on the female labor force and female management in the US and India. I chose those two countries because the US prohibits the dismissal of pregnant women while India does not. As mentioned above, I did not have enough data to consider different countries over a longer time period. There were some revealing conclusions, mainly the differences in female management. India's share of women in management at its highest (17.3%) was still less than half of the US's share at its lowest (36.9%). As interesting as that is, as I mentioned above, without more information, we cannot draw the conclusion that the anti-discrimination policies had this effect on female management. Thus, there must be other factors that impact female employment.

The most significant connection I noticed was the correlation between contraceptive prevalence and female labor force participation. I looked at the evolution of contraception prevalence, as well as female participation in the labor force, in the US from the 1960s to now. There was a clear correlation, as both contraception and the participation rate had a dramatic increase in the late 1960s to mid 1970s. Historically, this aligns with a time of great civil unrest and political strife. Namely, it was a time when women were protesting both for their right to contraception as well as their right to work. However, as we get closer to modern times, the connection becomes less distinct.

It must be noted that the data I had to work with was quite large, but also quite inaccessible. Although I tried, I could not find accessible texts that clearly explained how the data was collected. It was mentioned that the data was collected through surveys, but no further information was given on how these surveys were created and whether or not they reached all the people that these questions can apply to.

In conclusion, the factors that I chose to focus on, while of interest to me, do not seem to have a significant impact on the discrepancies between female and male employment. This topic is quite a broad one, and it is clear that there are many more factors that have an impact on female employment.